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Abstract —This paper presents a novel method to synthesize 
stochastic control Lyapnnov functions for a class of nonlinear, 
stochastic control systems. In this work, the classical nonUn- 
ear Hamilton-Jacohi-Bellman partial differential equation is 
transformed into a linear partial differential equation for a 
class of systems with a particniar constraint on the stochastic 
disturbance. It is shown that this linear partial differential equa¬ 
tion can he relaxed to a linear differential inclusion, allowing 
for approximating polynomial sointions to he generated using 
sum of squares programming. It is shown that the resulting 
solutions are stochastic control Lyapunov fnnctions with a 
number of compelling properties. In particular, a-priori bounds 
on trajectory suboptimality are shown for these approximate 
valne functions. The result is a technique whereby approximate 
solutions may be computed with non-increasing error via a 
hierarchy of semidefinite optimization problems. 

I. INTRODUCTION 

The stabilization of nonlinear systems is a central problem 
in control engineering. Lyapunov theory, wherein an energy¬ 
like function is used to show that some measure of distance 
from a stability point decays over time, is a critical tool for 
studying the convergence properties of a given system. Lya¬ 
punov theory may be generalized from analysis to synthesis 
of control systems using Control Lyapunov Function (CLF) 
[1]. However, the synthesis of a CLF for general systems 
remains a challenging open question, due to the bilinearity 
between the Lyapunov function and control input in the 
Lyapunov equation. 

A complementary and related domain in control engi¬ 
neering is the study of the Hamilton-Jacobi-Bellman (HJB) 
equation, a partial differential equaiton that governs the 
optimal control of a system. Methods to calculate the solution 
to the HJB equation via semidefinite programming have been 
proposed previously by Lasserre et al. [2]. In this work, 
we propose an alternative line of study based on the linear 
structure of a particular form of the HJB equation. Since 
the late 1970s, researchers [3]-[6] have made connections 
between stochastic optimal control and reaction-diffusion 
equation through a logarithmic transformation. This line of 
research has recently been the subject of focused study 
by Kappen [7] and Todorov [8]. These results have been 
developed in a number of compelling directions [9]-[13]. 

This paper combines these previously disparate helds of 
dynamic programming and Lyapunov theory by considering 
the value function, the solution to a stochastic HJB equation, 
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as a Stochastic CLF (SCLF). The HJB solution is global, in 
that it incorporates all potential initial system states, and op¬ 
timal. Here, we propose polynomial candidate approximate 
solutions to the HJB, extending recently developed tools in 
polynomial optimization to a new class of problems. It is 
already known that the solution to the deterministic HJB is 
in fact a CLF [14]. This paper shows that our approximated 
value function solutions are SCLFs as well. 

A preliminary version of this work appeared in [15] and 
[16], where the use of semidehnite relaxations for solving 
the HJB were hrst considered. However, the stabilization 
properties of the resulting solutions were not investigated. 
Instead, these previous works focused on HJB solutions for 
path planning problems, and did not have guarantees on 
trajectory performance when using approximate solutions to 
the HJB. 

The rest of this paper is organized as follows. Section |II] 
reviews the linearly solvable HJB equations, control Lya¬ 
punov functions, and sum of squares programming. Section 
Uni introduces a relaxed formulation of the HJB solutions 
which is efficiently computable using the sum of squares 
methodology. Section |IV] analyzes the properties of the 
relaxed solutions, such as approximation errors relative to 
the exact solutions. This section also shows that the relaxed 
solutions are SCLFs, and that the resulting controller is 
stabilizing. An example is presented in Section|V]to illustrate 
the optimization technique and its performance. Section |Vl] 
summarizes the hndings of this work and discusses future 
research directions. 

H. BACKGROUND 

This section briefly describes the notation and reviews 
necessary background on the linear HJB equation, SCLF, 
and SOS programming. 

A. Notation 

Table U summarizes the notation of different sets used in 
this work. A point on a trajectory, x{t) C K", at time t 
is denoted Xt, while the segment of this trajectory over the 
interval [t,T] is denoted by 

A compact domain in K" is denoted as fl where H C M", 
and its boundary is denoted as dfl. A domain fl is a basic 
closed semialgebraic set if there exists gtix) G K[x] for i = 
1, 2,..., TO such that H = {x | pijx) > 0 Vz = 1, 2,..., to}. 

Given a polynomial p{x), p{x) is positive on domain H 
if p{x) > 0 Vx G H, p{x) is nonnegative on domain fl if 
p(x) > 0 Vx G H, and p{x) is positive dehnite on domain H 
where 0 G H, if p(0) = 0 and p(x) > 0 for all x G H\{0}. 
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TABLE I 
Set notation 


Notation 

Definition 

z+ 

All positive integers 

R 

All real numbers 

R+ 

All nonnegative real numbers 

R" 

All n-dimensional real vectors 

R[3j] 

All real polynomial functions in x 

X m 

All n X m real matrices 

Rnxm[^] 

All M S such that Mij S R[x] V i,j 

K 

All continuous nondecreasing functions : R+ R+ 

such that /J.(0) = 0, ^{r) > 0 if r > 0, and ^{r) > ^{r') 
if r > r' 

^k,k' 

All functions / such that / is fc-differentiable with respect 
to the first argument and /c'-differentiable with respect to 
the second argument 


If it exists, the infinity norm of a function is defined as 
ll/lloo = a; G n. To improve readability, 

a function, f{xi,...,Xn), is abbreviated as / when the 
arguments of the function are clear from the context. 


B. Linear Hamilton-Jacobi-Bellman (HJB) Equation 

Consider the following affine nonlinear dynamical system, 

dxt = {f{xt) + G{xt)ut) dt + B{xt) dut (1) 


where xt S is the state at time f in a compact state 
space domain LI C M", Ut G ffi™ is the control input, 
f(x) G K"[x], G(x) G B(x) G are real 

polynomial functions of the state variables x, and ojt G 
is a vector consisting of Brownian motions with covariance 
Eg, i.e., ujf has independent increments with ujI — uj\ ~ 
— s)), for A/” (p, tr^) a normal distribution. The 
domain LI is assumed to be a basic closed semialgebraic 
set defined as 17 = {cc | gi(x) G R[x],gi(a;) > 0 Vi = 
1,2,, to}. Without loss of generality, let 0 G 17 and x = 0 
be the equilibrium point, whereby f(0) = 0, 0(0) = 0 and 
5(0) = 0. 

The goal is to minimize the following functional. 


E^JJ(x,u)] = 


1 

(t>{xT)+J q{xt) +-uj Rutdt 


( 2 ) 


subject to Q, where </> G K[x], cj) ■. LI ^ K+ represents 
a state-dependent terminal cost, q G K[x], q : 17 —>■ IR_|_ is 
state dependent cost, and R G is a positive definite 

matrix. T, unknown a priori, is the time at which the system 
reaches the domain boundary or the origin. This problem is 
generally called the first exit problem. The expectation E^^ 
is taken over all realizations of the noise w*. For stability 
of the resultant controller to the origin, q and are also 
required to be positive definite functions. The solution to 
this minimization problem is known as the value function, 
L : 17 — K+, where beginning from an initial point xt at 
time t 


Based on dynamic programming arguments [17, Ch. III.7], 
the HJB equation associated with this problem is a nonlinear, 
second order partial differential equation (PDE) 

0 = 9 + i {W^Vf GR-^G^ (V,L) 

+ ^Tr((V,,L)5E,5^) (4) 

with boundary condition V{x) = (j){x) and the optimal 
control effort takes the form 

u* = -R-^G'^V,,V. (5) 

For the stabilization problem on a compact domain, it is 
appropriate to set the boundary condition to be fix) = 0 
for X = 0, indicating zero cost accrued for achieving the 
origin, and (j){x) > 0 for x G dLl \ {0}. In practice, (/>(x) at 
the exterior boundary is usually chosen to be a large number 
depending on the applications to impose large penalty for 
exiting the predefined domain. 

In general, 0 is difficult to solve due to its nonlinearity. 
However, with the assumption that there exists a A > 0 and 
a control penalty cost i? in (|^ satisfying 

AG(xt)R-^G(xtf = B(xt)S,B(xtf ^ S(xt) = St, 

(6) 

and using the logarithmic transformation 

l/ = -AlogT', (7) 

it is possible [7], [8], after substitution and simplification, to 
obtain the following linear PDE from Q; 

0 = -l9^ + /^(V,T') + irr((V,,^)Et) x G 17 

4>(x) 

T'(x) = e 5^ X G dLl. (8) 

This transformation of the value function has been deemed 
the desirability function [8]. For brevity, define the following 
expression 

£(4-) ^ /^(V,T-) + ^Tr ((V,,^) E*) 
and the function ijj{x) at the boundary as 
ip{x) = X G dLl. 

Condition (|^ restricts the design of the control penalty R, 
such that control effort is highly penalized in subspaces with 
little noise, and lightly penalized in those with high noise. 
A specific case for which this condition is satisfied is for 
systems in which B(xt) = G{xt). Additional discussion is 
given in [8]. 

C. Stochastic Control Lyapunov Functions (SCLF) 

Before the stochastic control Lyapunov function (SCLF) 
is introduced, the definitions for two forms of stability are 
provided, following the definitions in [18, Ch. 5]. 

Definition 1. Given Q. the equilibrium point at x = 0 is 
stable in probability for t > 0 if for any s > 0 and e > 0, 


V (xt) = min E^,;^ [j (xp^r], U[t,T])] • (3) 


lim P 

x—^0 


sup \X^ 

t>s 


^it)\ > 


= 0 
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where is the trajectory of Q starting from x at time 

s. 

Intuitively, Definition [T] is similar to the notion of stability 
for deterministic systems. The following is a stronger sta¬ 
bility definition that is similar to the notion of asymptotic 
stability for deterministic systems. 

Definition 2. Given ([T]), the equilibrium point at x = 0 
is asymptotically stable in probability if it is stable in 
probability and 


The set of SOS polynomials in x is denoted as S[a;]. 

A sufficient condition for non-negativity of a polynomial 
f{x) is that f{x) G S[a:]. This seemingly simple fact is 
compelling, as testing the membership of a polynomial in 
S[a;] may be performed as a convex problem [19]. 

Theorem 7. jJ9, Thm. 3.3] The existence of a SOS 
decomposition of a polynomial in n variables of degree 2d 
can be decided by solving a semidefinite programming (SDP) 
feasibility problem. 


lim P( lim = O) = 1 

X—>-0 Kt—foc) J 

where X^'^ is the trajectory of Q starting from x at time 

s. 


For stochastic systems, the SCLF and Lyapunov theorems 
are defined as follows. 

Definition 3. A stochastic control Lyapunov function (SCLF) 
for system Q is a positive definite function V G on a 
compact domain O = LlU {0} x {f > 0} such that 

v(o,f) = o, v{x,t) > p,{\x\) yt 

3 u{x,t) s.t. L{V{x,t)) <0 y {x,t) G 0\{{0,t)} 

where p, G 1C, and 

L(V) = dtV+X,V^if+Gu) + ^Tr((V,,V)BE,B^). 

(9) 

Theorem 4. [18, Thm. 5.3] For system Q. assume that 

there exists a SCLF and a u defined in Definition Then, 
the equilibrium point x = 0 is stable in probability, and u 
is a stabilizing controller. 

To achieve the stronger condition of asymptotic stability 
in probability, we have the following result. 

Theorem 5. [18, Thm. 5.5 and Cor. 5.1] For system Q. 
suppose that in addition to the existence of a SCLF and a u 
defined in Definition u is time-invariant, 

V{x, t) < p'{\x\) V t 
L{V{x,t)) <0 V (x,t) G C>\{(0,f)} 

where p' G 1C. Then, the equilibrium point x = 0 is asymp¬ 
totically stable in probability, and u is an asymptotically 
stabilizing controller. 


D. Sum of Squares (SOS) Programming 

This section provides a brief review of SOS programming, 
the tool by which we will use to generate approximate 
solutions to the HJB equation. A complete introduction to 
the subject of SOS programming is available in [19]. 

Definition 6. A multivariate polynomial f{x) is a sum of 
squares (SOS) if there exist polynomials fQ{x),...,fmix) 
such that 

m 

i=0 


Hence, by adding SOS constraints to the set of all positive 
polynomials, testing nonnegativity of a polynomial becomes 
a tractable SDP problem. The converse question, is a nonneg¬ 
ative polynomial necessarily a SOS, is unfortunately false, 
indicating that this test is conservative [19]. Nonetheless, 
SOS feasibility is sufficiently powerful for our purposes. 

Theorem guarantees a tractable procedure to determine 
whether a particular polynomial, possibly parameterized, is a 
SOS polynomial. Our method combines multiple polynomial 
constraints into an optimization formulation. To do so, we 
need to define the following polynomial set. 

Definition 8. The preordering of polynomials gi{x) G M[a;] 
for i = 1,2,... ,m is the set 

^ s^{x)gi{x)''^ ■ ■ ■ gmix)'"' 

(10) 

The following proposition is useful to incorporate the 
domain Cl in our optimization formulation later. 

Proposition 9. Given f(x) G K[a;], if f(x) G P{gi, ■. ■, g-m), 
on the domain H = {x | gi{x) G > 0,f € 

{1, 2,..., to}}, then f{x) is nonnegative on Cl. If there exists 
another polynomial f'{x) such that f'(x) > f(x), then f'{x) 
is also nonnegative on Cl. 

To illustrate how this proposition applies, consider a 
polynomial f(x) on a domain defined hy x G [—1,1]. The 
bounded domain can be equivalently defined by polynomials 
gi(x) = 1 -F a: and g 2 {x) = 1 — a;. To certify that 
f{x) > 0 on the specified domain, constmct a function 
h{x) = si(a;)(l -I- x) -F S 2 (x)(l — x) -F S 3 (x)(l -F x)(l — x) 
where Si G S[x] and certify that /(x) — h(x) > 0. 
Notice that h(x) G P(1 -F x, 1 — x), so h(x) > 0. If 
/(x) — h(x) > 0, then /(x) > h(x) > 0. Proposition 
1^ is applied here. Finding the correct Si(x) is not trivial 
in general. Nonetheless, as mentioned earlier, if we further 
impose that f(x) — h(x) G S[x], then checking if there exists 
Si(x) such that /(x) — h(x) G S[x] becomes a semidefinite 
feasibility program as given by Theorem |7] More concretely, 
the procedure may begin with a limited polynomial degree 
for Si(x), increasing the degree until a certificate is found 
(if one exists) or the computation resources are exhausted. 

To simplify notation in later text, given a domain Cl = 
{x I gi{x) G R[x], (jfi(x) > 0, i G {1, 2,..., to}}, we set the 
notation P{Cl) = P{gi,.. .,gm)- 
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III. Sum-of-Squares Relaxation oe the HJB PDE 

This section demonstrates how SOS programming can be 
used to solve the linear HJB via an SOS relaxation. We would 
like to emphasize the following standing assumption, typical 
of moment and SOS-based methods [2], [19]. 

Assumption 10. Assume that system 0 evolves on a 
compact domain O C M" that is also a basic closed 
semialgebraic set such that O = {a; | gi{x) S K[a;],pi(x) > 
0, i S {1,..., fc}} for some k > 1. Then, the boundary dfl 
is polynomial representable. We use the notation dfl = {x \ 
hi(x) G K[2;], Jli^i ^i(^) = 0}/or some m>l to describe 
this boundary. 

The following definitions formalize several operators that 
will be useful in later text. 

Definition 11. Given a basic closed semialgebraic set = 
{x I gi(x) G M[x], 5 i(x) > 0,i G {1,..., k}} and a set of 
SOS polynomials, S = {si/(x) | s^ix) G S[x],i^ G {0,1}^'}, 
define the operator T) as 

'D{n,S)= ^ s„{x)gi{x)’'^ ■ ■ ■ gkix)'''^ 

where 22(0,5) G T’(O). 

Definition 12. Given a polynomial inequality, p{x) > 0, 
the boundary of a compact set 30 = {x | hi{x) G 
K[x], ut _^hi{x) = 0} and a set of polynomials, T = 
{ti{x) I ti{x) G K[x],i G {1,...,m}}, define the operator 
B as 

5(p(x),30,T) = {p(x) -ti{x)hi{x) I i G {l,...,m}} 

where B returns a set of polynomials that is nonnegative on 
30. 

A. Relaxation of the HJB equation 

For the remainder of this paper, we assume that the unique 
solution to Q and ([^ exists in the viscosity solutions sense 
(see [17], Chapter V) and denote the unique solutions as V* 
and T'* respectively. 

The equality constraints of ([^ may be relaxed (in either 
direction) as follows 

jg'^>-£(•!>) < (>)0 

X S 30. (11) 

This relaxation provides a point-wise bound to the true 
solution, and it may be enforced via SOS programming. In 
particular, a solution to ( [TT] i, denoted as T'/ (4'„), is a lower 
(upper) bound on the solution T** over the domain O. 

Proposition 13. Given a smooth function T'; ('i’u) that 
satisfies o, then T'i ('i’u) B a viscosity subsolution (su¬ 
persolution) and T'i < T'* (T'„ > T'*)/or all x G £1. 

Proof. By [20, Def. 2.2], the solution T'/ is a viscosity 
subsolution. Note that T'* is both a viscosity subsolution and 
a viscosity supersolution, and T'/ < T** on the boundary dVl. 
Hence, by the maximum principle for viscosity solutions [20, 


Thm 3.3], 'I'i < T'* for all x G £t. Similar argument applies 
for □ 

Because the logarithmic transform Q is monotonic, one 
can relate these bounds on the desirability function to bounds 
on the value function as follows: 

Proposition 14. If the solution to 0 is V*, given solutions 
Vu = — AlogT*/ and Vi = — AlogT'„ from ( fTT] !, then 14 > 
F* and Vi < F*. 


B. Controller Synthesis 

Given that relaxation ( [TT] i results in a point-wise upper and 
lower bound to the exact solution of 0, we construct the 
following optimization that provides a suboptimal controller 
with bounded residual error: 


mm e 


s.t. ^qA>i - <0 

A 

X G H 

0 < - £(«'„) 

X G H 

- T-i < e 

X G H 

0 <A>i <jP 

X G 3H 

< 0 

X* > 0 

'0 

II IV 
0 

X* < 0 


( 12 ) 


where x* is the i-th component of x S H. As mentioned 
in Section |III-A| the first two constraints result from the 
relaxations of the HJB equation, and the fourth constraint 
arises from the relaxation of the boundary conditions. The 
third constraint ensures that the solution error is bounded by 
e, and the last three constraints ensure that the solution yields 


a stabilizing controller, as will be made clear in Section IV 


In order to solve as a semidefinite optimization 

problem, we restrict the polynomial inequalities such that 
they are SOS polynomials instead of nonnegative polynomi¬ 
als. Therefore, after applying Proposition to the domain 
constraints, the resulting optimization is 


min e (13) 

s.t. jq^iCiAii)-'D(n,Si) gS[x] 

- Ci'^u) -V(n,S 2 ) GS[x] 

e-{^^>^-A>l)-Vin,S3)GS[x] 
B{A>i,dn,ri) gs[x] 

5(i/>-^/,3H,r2) gS[x] 
5(T',-^,3H,r3) gS[x] 
-d^^A>i-V(nn{x^ >0},54) gS[x] 

3a,i'!'/ — I2(H n {—X* > 0}, 55 ) G S[x] 
^/(O) = 1 


where S = 
Definition 11 


(5i,... , 54 , 55 ), Si C S[x] is defined as in 
T = ( 71 , 72 , 73 ), and 75 C K[x] is defined as 
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in Definition 12 With a slight abuse of notation, B{-) G S[a:] 
implies that each polynomial in B{-) is a SOS polynomial. 

If the degrees of polynomials are hxed, optimization ( [T3] ) 
is convex and may be solved as an SDP via Theorem The 
next section will discuss the systematic approach we used to 
solve the optimization. 


Remark 15. By definition, the viscosity solution is a contin¬ 
uous function [20, Def. 2.2], Consequently, the solution T'* is 
a continuous function defined on a bounded domain. Hence, 
T'u and T'; can be made arbitrary close to T'* by the Stone- 
Weierstrass Theorem [21 [ in However, this guarantee is 
lost when and T'/ are restricted to be SOS polynomials. 
The feasible set of the optimization problem 01 is therefore 
not necessarily non-empty for a given polynomial degree. 


Proposition 17. The hierarchy of SOS programs consisting 
of solutions to 01 with increasing polynomial degree pro¬ 
duces a sequence of solutions 'I'f, S‘^, such that 

Proof. Polynomials of degree d form a subset of polynomials 
of degree d + 1. Thus, at a higher polynomial degree d + 1, 
a previous solution at a lower polynomial degree d is still a 
feasible solution when the coefficients for monomials with 
total degree d+1 is set to 0. Consequently, the optimal value 
gd+i cannot be smaller than for all d. □ 

Although the bound on the pointwise error is non¬ 
increasing, the actual error may in fact increase between 
iterations. We bound this variation as follows. 


C. Hierarchy of SOS programs 

Let d be the maximum degree of T';, and polynomials 
in S and T, and denote e'^) as a solution 

to ([T3) when the maximum polynomial degree is hxed at d. 
The hierarchy of SOS programs with increasing polynomial 
degree produces a sequence of possibly empty solutions 
where / C This sequence will 
be shown in the next section to improve, under the metric of 
the objective in ([T^. The use of such hierarchies has become 
common in polynomial optimization [19], [22]. Once a 
satisfactory error is achieved or computational resources run 
out, the lower bound T'; is used to compute the suboptimal 
controller. The suboptimal controller for a given error e is 
computed as u'^ = —R~^G^S7 where Vu = — AlogT*;. 
The next section will analyze the properties of the solutions 
and the suboptimal controller. 

IV. ANALYSIS 


Corollary 18. Suppose ||'I'‘^ — 'P*||oo < and — 

= 7^+1. Then, < d^. 


Proof. From Proposition 


17 


ryd+l 


d-l-1 


□ 


Note that e is only non-increasing as polynomial degree 
increases. Therefore, Proposition 17 and Corollary 18 does 
not guarantee a convergence of e to zero. 


B. Properties of the Approximated Value Function 

We now investigate the implications of Corollary [TSl upon 
the value function. Henceforth, denote the solution to 0 as 
V*{xt) = min„[t.T] Ea;Jd(a;t)] = -AlogT'*(a:t), and the 
suboptimal value function computed from the solution of 
( fT3l l as 14 = -AlogT';. 

Theorem 19. I 4 is an upper bound of the optimal cost V* 
such that 


This section establishes appealing properties of the so¬ 
lutions to the optimization ([T^ that are relevant for feed¬ 
back control. First, we show that the solutions in the SOS 
program hierarchy are uniformly bounded relative to the 
exact solutions. We next prove that the solutions to the 
relaxed stochastic HJB equation are SCLFs, and they yield 
stabilizing controllers. Finally, we show that the costs of 
using the approximate solutions as controllers are bounded 
above by the approximated value functions. 


0 < 14 — 1^* < — Alog — min ll, (14) 

where rj = e ^ 

Proof. By Proposition [T?} 14 > C* and hence, 14 —V* > 0. 
To prove the other inequality, by Proposition [T^ 

T'; T'* — e / e A 

K - F* = -Alog ^ < -Alog < -Alog (^1 - ^ j 


A. Properties of the Approximated Desirability Functions 

First, compute the approximation error of the true desir¬ 
ability function T*; or T't, obtained from optimization ( fO] ). 

Proposition 16. Given a solution 'F;, 5, T, e) to •El 
for a fixed degree d, the approximation error of a desirability 
function is bounded as — 4'*||oo < e where T' is either 
A’u or 


Proof. By Corollary 13 4'; is the lower bound of T'*, and 
is the upper bound of T'*. So, e > T'u—T'; > 0 and > 
> 4/;. Combining both inequalities, one has —IF* < e 
and T** — T*; < e. Therefore, ||4' — T'*||oo < e where T' is 
either T'u or 4';. □ 


The last inequality holds because T** > e a ' " by 
definition in 0. Since T*/ is the lower bound of T'*, the right 
hand side of the hrst equality is always a positive number. 
Therefore, I 4 is a point-wise upper bound of 4*. □ 

Corollary 20. Let Vf} = -AlogT'f and = 

-AlogT-f+A If -V* < e‘^ and V[}+^ - V* = 
then 7 '^+^ < —Alog ^1 — min |l, 

At this point, we have shown that the lower bound of the 
desirability function gives an upper bound of the suboptimal 
cost. More importantly, the upper bound of the suboptimal 
cost is non-increasing as the polynomial degree increases. 
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C. The Exact and Approximate HJB solutions are SCLFs 

Here, we show that the approximate value function derived 
from the lower desirability approximation, 'I';, is a SCLF. 

Theorem 21. Vu is a stochastic control Lyapunov function 
according to Definition^ 

Proof. The constraint T'i(O) = 1 ensures that 14(0) = 
—AlogT'i(O) = 0. Notice that all terms in J{x,u) from (|^ 
are positive dehnite, resulting in V* being a positive dehnite 
function. In addition, by Proposition [l4| > V*. Hence, 

1^“ is also a positive dehnite function. The second and third 
to last constraints in ( [T3| l ensures that T*; is nonincreasing. 
Hence, 14 is nondecreasing satisfying /i(|a;|) < Vu{x) < 
p'{\x\) for some p,,p,' G K.. 

Next, show that there exists a u such that LiVu) < 0. 
Following (j^, let 

. (15) 

Notice that from the dehnition of 14, Va;14 = — 
and VxxK = So, u = 

Then, from 

L{Vu) = 

+ \Tr 

where 9(14 = 0 because 14 is not a function of time. 
Applying the assumption in ^ and simplifying, 

HVu) = 

From the hrst constraint in ( fO] ), 

\q-^i - fi^x'fi) - \Tr S() < 0 

-^(V.'I'z)^/ <-q+ ^Tr ((V,,T-() S*) . 

Substituting this inequality into L(14) and simplifying yields 

L{Vu) <-q- < 0 (16) 

because g > 0, A > 0 and S( is positive semidehnite by 
dehnition. Since 14 satishes Dehnition14 is a SCLF. □ 


Corollary 22. The suboptimal controller u'^ = 

—R~^G"^WxVu is stabilizing in probability within the 
domain Q. If E( is a positive definite matrix, the suboptimal 
controller u'^ = —i?“^G^Va;14 is asymptotically stabilizing 
in probability within the domain Q. 


Proof This corollary is a direct consequence of the con¬ 
structive proof of Theorem 21 and Theorem □ 


D. Bound on the Total Trajectory Cost 

We conclude this section by showing that the expected 
total trajectory cost incurred by the system while operating 
under the suboptimal controller of (15 i is bounded. 

Theorem 23. Given the control law = —i?“^G^Va;14, 
-/« < K < 1^* - Alog - min |l, (17) 

where Ju = Eojt [</>r(a;r) + /q^ r(x(, ul)dt], the expected cost 
of the system when using the given control law, u'^. 

Proof By Ito’s formula, 

dVu{xt) = L{Vu){xt)dt + V xVu{xt)B{xt)dujt. 

where L{V) is dehned in (|^. Then, 

14(a:() = 14(a;o,0)+ [ L{Vu){xs)ds 
Jo 

+ I ^xVu{xs)B{Xs)duJs. (18) 

Jo 

Take the expectation of this equation to get 


^LJtWuixt)] = 14(a;o,0) +E^ 


L{Vu){xs)ds 


Go 


whereby the last term of ( [T8| drops out because the noise is 
assumed to have zero mean. The expectations of the other 
terms return the same terms because they are deterministic. 
From ( [Thl l, 

L{Vu) < -g- 

= GR-^G'^ (V.K) 

= -q-\{u^fRu^ 

where the hrst equality is given by the logarithmic transfor¬ 
mation and the second equality is given by the control law 
= —i?“^G^Va;14- Therefore, 

rt 


Ea;J14(a;t)] = 14(a;o) + E^ 


< 14(a;o) - E^ 


L{Vu){xs) ds 


Go 


' r 1 

Jq ds 


= Vu{xo) - J{xo,u'') +E^J [4>{xt)] 

Therefore, 14 (j;o) - J{xo,u’^) > E^J14(a:() - fixt)]. 
By dehnition, Vu{xt) > 4‘{xt) for all xt G 

Thus, E^J14(a;7’) — f{xT)\ > 0. Consequently, 14(azo) — 


J{xo,u'^) > 0, and 14(a:o) > J{xo,u^). Lastly, Theorem 19 
gives the second inequality in the theorem. □ 


V. Numeric Examples 

This section studies the computational characteristics of 
our method using a scalar unstable system. The optimization 
parser YALMIP [23] was used in conjunction with the 
semidehnite optimization package MOSEK [24] to solve the 
optimization problem ([T3]l. 
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Fig. 1. The desirability function for varying polynomial 
degree. The true solution is the black curve. 


Consider the following unstable scalar nonlinear system 

dx = (—+ 5x^ + 3x + u) dt + doj (19) 

on the domain a; S 11 = {a; | —1 < x < 1}. The noise 
model considered is Gaussian white noise with zero mean 
and variance Sg = 1. The goal is to stabilize the system at the 
origin. Instead of zero, we choose the boundary at two ends 
of the domain to be '£'(—!) = 20e“^° and 'f'(l) = 20e“^°. 
At the origin, the boundary is set as ^'(0) = 1. We set 
q = x^, and i? = 1. Because of the natural division of the 
domain, the solutions for both domains can be represented by 
smooth polynomials respectively, and solved independently. 

The desirability functions that results from solving 0 
for varying polynomial degrees are shown in Figure [T] The 
optimization problem is not feasible for polynomial degree 
below 12. The true solution is computed using Mathematica. 
The kink at the origin is expected because the HJB PDF 
solution is not necessarily smooth at the boundary, and in 
this situation the origin is itself a boundary between the two 


domain halves. The approximation error e for both partitions 
is shown in Figure 2(a) for increasing polynomial degree. 
As seen in the plots, the approximation improves as the 
polynomial degree increases. 

To quantify the performance of the controller, a Monte 
Carlo experiment is performed. For each polynomial degree 
that is feasible, the controller obtained from T*/ in optimiza¬ 
tion ( [T3] i is implemented in 20 simulations of the system 
subject to random samples of Gaussian white noise with 
Eg = 1. The initial condition is fixed at xq = —0.5 and 
t = 0. The continuous system is integrated numerically 
using Euler integration with step size of 0.005s. The sim¬ 
ulation is terminated when the trajectories enter the interval 
[—0.005,0.005] centered on the origin. Figure 2(c) shows 
the comparison between J„(xo, t) and Vu{xo, t) for different 
polynomial degrees whereby J„ is the expected cost and 
Vu is the value function computed from ik/ in optimization 
([T3J. Figure |2(b)| illustrates several sample trajectories. In 
general, the trajectories converge earlier when the polynomial 
degree is higher. This observation is expected because the 
approximation error is smaller as the polynomial degree 
increases. 


VI. CONCLUSION 

This paper proposes a novel method to solve the linear 
Hamilton Jacobi Equation of an optimal control problem 
with nonlinear, stochastic systems dynamics via sum of 
squares programming. Analytical results provide guarantees 
on the suboptimality of trajectories when using the approx¬ 
imate solutions for controller design. Consequently, one can 
synthesize a suboptimal stabilizing controller to nonlinear, 
stochastic dynamical systems. 

To improve the algorithm, the monomials of the polyno¬ 
mial approximation can be chosen strategically in order to 
decrease computation time while achieving high accuracy. 
Thus, a promising future direction is the synthesis of the 
work presented here with that of [25], where HJB equations 
were solved in dimension twelve and higher. To improve 





(b) 


20 30 40 

Polynomial degree 

(c) 


Fig. 2. Computational results of system (a) Convergence of the objective function of as the degree of polynomial 
increases. The approximation error for x < 0 is denoted as e; and the approximation error for x > 0 is denoted as Cr- (b) 
Sample trajectories using controller computed from optimization problem d with different polynomial degrees starting 
from six randomly chosen initial points, (c) The comparison between and 14 for different polynomial degrees whereby 
Ju is the expected cost and 14 is the value function computed from optimization problem ( fT3] l. The initial condition is fixed 
at Xq = —0.5. 




























the numerical conditioning of these optimization techniques, 
other numerical schemes are also under investigation [16]. 

There remains the question of the limitations placed by 
the structural constraint (|^. A compelling research question 
is the suboptimality of controllers and trajectories when 
approximating systems that do not adhere to the constraint, 
such as deterministic systems or those with noise in states 
without a control channel. 
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